Abstract

We examine nonparametric dose-finding designs that use toxicity estimates based on all available data at each dose allocation decision. We prove that one such design family, called here "interval design", converges almost surely to the maximum tolerated dose (MTD), if the MTD is the only dose level whose toxicity rate falls within the pre-specified interval around the desired target rate. Another nonparametric family, called "point design", has a positive probability of not converging. In a numerical sensitivity study, a diverse sample of dose-toxicity scenarios was randomly generated. On this sample, the "interval design" convergence conditions are met far more often than the conditions for one-parameter design convergence (the Shen-O'Quigley conditions), suggesting that the interval-design conditions are less restrictive. Implications of these theoretical and numerical results for small-sample behavior of the designs, and for future research, are discussed.

1 Introduction

Dose-finding designs attempt to identify the dose for which only a given fraction $p$ of the population experiences some adverse (e.g., toxic) response. This dose is often called the experiment's "target", and can be symbolically denoted $F^{-1}(p)$, where $F(x)$ is an adverse-response-rate curve, monotonically increasing with the dose strength $x$. In practice, it is more common to seek the dose closest to target from among a pre-specified fixed set of dose levels. This is known as the maximum tolerated dose (MTD). Dose-finding designs self-correct the dose allocation, according to hitherto observed outcomes, and thus belong to the family of sequential designs.



Some dose-finding designs, known as "rule-based" or "memoryless" (O'Quigley and Zohar, 2006), are characterized by fixed dose-transition rules based on a limited subset of available observations (usually the most recent ones), and without any assumptions about the dose-toxicity curve $F$. A prominent example is the '3+3' protocol (Carter, 1973), used for the vast majority of Phase I cancer trials. We will refer to this class as "short-memory" designs. Another, recently popular approach is called "model-based" or "designs with memory". Such designs incorporate a model for $F$, and allocate doses via an estimation procedure based on all available observations. We will call these designs "long-memory". The overwhelming majority of novel dose-finding designs appearing in recent literature are long-memory, with Bayesian designs taking center stage (O'Quigley et al., 1990; Babb et al., 1998). In Bayesian designs a parametric model curve $G(x, \theta, \phi)$ substitutes for $F$, with $\theta$ denoting data-estimable parameters and $\phi$ fixed prior parameters. In the most common implementation, the next cohort is chosen according to where $G(x, \hat{\theta}, \phi)$ crosses the horizontal line $y = p$ (Figure 1, left).



Designs that do not clearly belong to the short-memory or long-memory types have also been suggested. These include two-stage (Storer, 2001; Potter, 2002) and hybrid designs (Ivanova et al., 2003; Oron, 2007, Section 5.3). Yet another intermediate approach suggests using all available data to estimate $F$, i.e., it is long-memory, but avoids parametric or Bayesian model specification (Leung and Wang, 2001; Yuan and Chappell, 2004; Ivanova et al., 2007). This nonparametric long-memory design type is the subject of our article.

We focus on convergence of these designs. The term "convergence" applied to dose-finding does not usually refer to convergence of our estimate of $F$; the point estimates of $F$ at the doses are guaranteed to converge almost surely to their true value in the limit of infinite sample size, as will be proven below in Section 2. Rather, convergence in the dose-finding context refers to allocation convergence: the convergence of the sequence of allocated doses to some stationary pattern. Short-memory designs belonging to the up-and-down family (Dixon and Mood, 1948) generate Markov chains of doses, converging at a geometric rate to a stationary random walk whose dose-allocation distribution is centered close to target. The properties of up-and-down designs can be analyzed using standard Markov chain theory (Derman, 1957; Durham and Flournoy, 1994; Gezmu, 1996; Gezmu and Flournoy, 2006; Oron and Hoff, 2009). As to long-memory designs, proofs of allocation convergence are few and far between. In fact, nearly all of the novel long-memory designs (and dozens of them have been put forth since 1990) lack a convergence proof.

To date, we are aware of the following published long-memory convergence 
proofs: 



• Shen and O'Quigley (1996) proved that the one-parameter frequentist analogue to the CRM design converges almost surely (at a root-n rate) to the MTD, a notion that will be defined in Section 5. This result is widely perceived as a generic convergence-under-misspecification proof for CRM. However, Cheung and Chappell (2002) demonstrate that in fact the proof requires rather restrictive conditions. This will be explored in more detail in Section 5.

• Zacks et al. (1998) present a similar result, under different and arguably even tighter restrictions on the form of $F$.

• Ivanova et al. (2003) prove that a hybrid 1950's design attributed to Narayana converges to a two-level random walk around the MTD; however, for reasons probably related to undesirable early-stage behavior (Oron, 2005), this design has not been mentioned since then.



None of these proofs applies to the nonparametric long-memory designs we 
examine here. 



[Figure 1 panels: "Mathew et al. (2004), via CRM" (left), "Mathew et al. (2004), via 'Point'" (center), "Mathew et al. (2004), via 'Interval'" (right); x-axis in each panel: Dose (mg/sq.m./wk).]

Figure 1: Demonstration of the parametric CRM design (left), and the nonparametric point (center) and interval (right) designs, using data from the Mathew et al. 2004 experiment targeting 30% toxicity (Mathew et al., 2004). Shown is the situation after Cohort 3, administered at 35 mg/m²/week. Dose spacing was uniform at 5 mg/m²/week. CRM (left) would allocate to Cohort 4 the dose closest to where the posterior model curve (solid line) crosses the dashed horizontal $y = 0.3$ line, i.e., 25 mg/m²/week. The 'X' marks denote the actual observed toxicity rates. The point design (center) would follow the same principle, but using a nonparametric curve obtained via isotonic-regression interpolation; hence, Cohort 4 would receive 30 mg/m²/week. The interval design (right) looks only at the actual toxicity rate at the dose Cohort 3 received ('X' mark). Since it falls above the interval marked by two dashed horizontal lines, the allocation will de-escalate one level to 30 mg/m²/week.



2 Preliminaries 



2.1 The Designs 

We describe the dose-finding problem via a latent-variable model: let $Y(x) \sim \mathrm{Bernoulli}(F(x))$ be a binary toxicity response to some dose strength $x$, with the toxicity-rate function $F(x)$ strictly monotone increasing but not directly observable. The overall goal is to estimate the target $Q_p = F^{-1}(p)$, which can be seen as the 100p-th percentile of $F$ if one thinks of $F$ as a cumulative distribution function of toxicity thresholds. Consider a sequential design treating $k_j \ge 1$ subjects at cohort $j$, $j = 1, 2, \ldots$, with the value of the allocated-dose r.v. $X_j$ taken out of a set of $m$ predetermined dose levels $\mathcal{D} = \{d_1, d_2, \ldots, d_m : d_1 < d_2 < \cdots < d_m\}$. For simplicity and without loss of generality with respect to our proofs, from here on we assume that all cohorts are of size 1, and index successive treatments as $X_i$, $i = 1, \ldots, n, \ldots$ The toxicity responses $Y_i$ are assumed independent given the $X_i$.

As mentioned in the introduction, rather than precisely estimate $Q_p$ researchers are often content with identifying the MTD, i.e., the dose level closest to $Q_p$ according to some distance criterion; we will denote the MTD as $d_{u^*} \in \mathcal{D}$. In this article we assume that distance on the response scale is used to find the MTD; in other words, $u^* = \arg\min_{1 \le u \le m} |F(d_u) - p|$.

All long-memory designs use the raw toxicity frequencies, which can be seen as Binomial point estimates of $F$ given $n$ observations:

$$\hat{F}_n(d_u) = \frac{\sum_{i=1}^{n} y_i \, 1(x_i = d_u)}{\sum_{i=1}^{n} 1(x_i = d_u)}, \quad u = 1, \ldots, m,$$   (1)

where $y_i$ is the binary toxicity outcome (0 or 1). While parametric designs use the $\hat{F}$ values indirectly as inputs to the calculation of $\hat{\theta}$, nonparametric long-memory designs use them directly, with a possible modification to ensure monotonicity of the $\hat{F}$ values using standard methods (Robertson et al., 1988).
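As a minimal illustration of (1) and of the monotonizing step, the following Python sketch computes raw toxicity frequencies per dose level and then applies a weighted pool-adjacent-violators pass, one standard way to obtain isotonic estimates. The function names (`raw_frequencies`, `pava`), the 0-based level indexing and the choice to weight by per-level sample size are assumptions made for this sketch; the text does not state which monotonizing algorithm is meant.

```python
def raw_frequencies(doses, outcomes, m):
    """Per-level raw toxicity frequencies as in (1).

    doses: 0-based level index for each subject; outcomes: 0/1 toxicities.
    Returns (freq, n), with freq[u] = None for levels never visited.
    """
    tox = [0] * m
    n = [0] * m
    for u, y in zip(doses, outcomes):
        n[u] += 1
        tox[u] += y
    freq = [tox[u] / n[u] if n[u] > 0 else None for u in range(m)]
    return freq, n


def pava(values, weights):
    """Weighted pool-adjacent-violators: nondecreasing fit to `values`.

    Call it on visited levels only (no None entries).
    """
    blocks = []  # each block holds [mean, total weight, number of pooled levels]
    for v, w in zip(values, weights):
        blocks.append([v, w, 1])
        # Pool backwards while the last two blocks violate monotonicity.
        while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
            v2, w2, c2 = blocks.pop()
            v1, w1, c1 = blocks.pop()
            blocks.append([(v1 * w1 + v2 * w2) / (w1 + w2), w1 + w2, c1 + c2])
    fitted = []
    for mean, _, count in blocks:
        fitted.extend([mean] * count)
    return fitted


# Hypothetical data: three subjects at each of three levels.
freq, n = raw_frequencies([0, 0, 0, 1, 1, 1, 2, 2, 2],
                          [0, 1, 0, 0, 0, 0, 1, 1, 0], m=3)
print([round(f, 3) for f in pava(freq, n)])  # [0.167, 0.167, 0.667]
```

In this made-up example the raw frequencies (1/3, 0, 2/3) violate monotonicity at the first two levels, so they are pooled into their weighted average 1/6, while the third level is left unchanged.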

Following are the definitions of the "point" and "interval" nonparametric 
designs. 



Definition 1 (i) A "point-based nonparametric long-memory" Phase I design (hereafter, "point design") starts at an arbitrary dose. At each subsequent step, the design allocates the next cohort to the level whose (possibly monotonized) $\hat{F}(d_u)$ value is closest to $p$. If the highest dose for which such an estimate is available maintains $\hat{F}(d_u) < p$, the experiment escalates to $d_{u+1}$ (boundary permitting), and vice versa.

(ii) An "interval-based nonparametric long-memory" Phase I design (hereafter, "interval design") starts at an arbitrary dose. At each subsequent step, the design compares a (possibly monotonized) $\hat{F}(d_u)$, with $d_u$ being the currently-administered dose, to the interval $(p - \Delta p_1, p + \Delta p_2)$, with $\Delta p_1 > 0$, $\Delta p_2 > 0$ predetermined constants. If

$$\hat{F}(d_u) \in (p - \Delta p_1, p + \Delta p_2),$$

$d_u$ will be administered again. If $\hat{F}(d_u) < p - \Delta p_1$, $d_{u+1}$ will be administered (unless $u = m$, in which case $d_m$ will be administered again), and vice versa if $\hat{F}(d_u) > p + \Delta p_2$.

For both design types, the recommended MTD is the next dose level that would have been allocated at the experiment's end, had another cohort been run.
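The two allocation rules of Definition 1 can be sketched in a few lines of Python. This is only an illustrative reading of the definition: the function names, the 0-based level indices, the tie-breaking toward the lower level in the point design, and the treatment of an estimate lying exactly on an interval endpoint (here: move away from the current dose) are all assumptions of the sketch.

```python
def interval_next(u, f_hat_u, p, dp1, dp2, m):
    """Next level (0-based) under the interval design, Definition 1(ii).

    u: current level; f_hat_u: toxicity estimate at the current level.
    """
    lower, upper = p - dp1, p + dp2
    if lower < f_hat_u < upper:
        return u                  # estimate inside the interval: repeat the dose
    if f_hat_u <= lower:
        return min(u + 1, m - 1)  # escalate, boundary permitting
    return max(u - 1, 0)          # de-escalate, boundary permitting


def point_next(f_hat, p):
    """Next level (0-based) under the point design, Definition 1(i).

    f_hat: (monotonized) estimates per level, None for levels not yet tried.
    """
    tried = [u for u, f in enumerate(f_hat) if f is not None]
    lowest, highest = tried[0], tried[-1]
    if f_hat[highest] < p:
        return min(highest + 1, len(f_hat) - 1)  # all tried doses look too safe
    if f_hat[lowest] > p:
        return max(lowest - 1, 0)                # all tried doses look too toxic
    # Otherwise pick the tried level whose estimate is closest to the target.
    return min(tried, key=lambda u: abs(f_hat[u] - p))


# Hypothetical estimates on m = 5 levels, target p = 0.3, interval (0.2, 0.4).
print(interval_next(2, 0.5, p=0.3, dp1=0.1, dp2=0.1, m=5))  # 1
print(point_next([0.0, 0.25, 0.6, None, None], p=0.3))      # 1
```

Note how the interval rule needs only the estimate at the current dose, while the point rule needs the whole vector of estimates.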



The point design was suggested by Leung and Wang (2001): it is a direct variation on parametric designs such as CRM, with the parametric curve $F(\theta)$ replaced by a monotone nonparametric interpolation of $\hat{F}$ between dose levels (Figure 1, center). The interval design's principle is different; one might call it "narrow long-memory" since the allocation decision is based on prior outcomes at the current dose only (Figure 1, right). Rather than look for some optimal dose at each cohort, the allocation would repeat the existing dose as long as the toxicity frequency at that dose falls within the interval. Different versions of the interval design were put forth by Yuan and Chappell (2004) and Ivanova et al. (2007). The interval design does not allow for skipping dose levels between consecutive cohorts.



2.2 Allocation Convergence 

We now clarify the meaning of allocation convergence using the terminology introduced above. The sample space for allocation convergence is the space of all permissible infinite sequences of assigned doses, which is a subset of $\mathcal{D}^{\infty}$ (usually subject to the constraint of no dose skipping). Each design induces a probability distribution on sequences in this sample space (probabilities of finite subsequences can be exactly calculated, with knowledge of the design's rules and of $F$ values at the doses). Almost sure convergence to the MTD means that sequences ending with infinite and uninterrupted repetitions of $d_{u^*}$ have a combined probability of 1.

On this sample space, define the random set

$$\mathcal{S} = \{u : n_u \to \infty \text{ as } n \to \infty\},$$   (2)

where $n_u$ is the number of subjects assigned to $d_u$. In words, $\mathcal{S}$ is the set of indices for levels appearing an infinite number of times in the sequence. Obviously $\mathcal{S}$ is nonempty for all sequences in the sample space, all being infinite. Moreover, since the interval design does not allow for dose skipping, $\mathcal{S}$ must be connected, i.e., composed of consecutive levels. Thus, the value of $\mathcal{S}$ for different sequences in the sample space can be described via an ordered pair of integer random variables $S_1 \le S_2$: $\mathcal{S} = \{S_1, \ldots, S_2\}$.

We end the preliminaries with the following point-estimate convergence result, which holds regardless of the type of design.

Lemma 1 $\hat{F}_n(d_u) \to F(d_u)$ as $n \to \infty$, almost surely for all $u \in \mathcal{S}$.

Proof: First, by definition of $\mathcal{S}$ we know that for all $u \in \mathcal{S}$, $n_u \to \infty$ as $n \to \infty$. Second, note that the point estimates can be written as

$$\hat{F}_n(d_u) = F(d_u) + \frac{1}{n_u} \sum_{i=1}^{n} 1(X_i = d_u)\,(Y_i - F(d_u)).$$   (3)

Now, $M_n = \sum_{i=1}^{n} 1(X_i = d_u)\,(Y_i - F(d_u))$ is a square integrable martingale with respect to the filtration $\mathcal{F}_n = \sigma(X_1, Y_1, \ldots, X_n, Y_n)$. Its quadratic variation is:

$$\langle M \rangle_n = \sum_{i=1}^{n} 1(X_i = d_u)\, F(d_u)\,(1 - F(d_u)) = n_u\, F(d_u)\,(1 - F(d_u)).$$

Therefore, due to the strong law of large numbers for martingales (ref. Shiryaev, 1996, p. 519, Theorem 4):

$$\frac{1}{n_u} \sum_{i=1}^{n} 1(X_i = d_u)\,(Y_i - F(d_u)) \to 0 \quad \text{a.s.} \quad \forall u \in \mathcal{S}.$$

Revisiting (3), the lemma's statement immediately follows. □



3 Interval-Design Convergence

Note that with respect to allocation convergence, the space of possible configurations of $\mathcal{S}$ can be partitioned into three major subspaces or events A, B, C:

• A: $S_1 = S_2 = u^*$,

• B: $S_1 < S_2$ and $u^* \in \mathcal{S}$,

• C: $u^* \notin \mathcal{S}$.

Almost sure convergence is equivalent to stating that Pr(A) = 1.

Theorem 1 (i) Dose allocations in interval designs converge almost surely to $d_{u^*}$, if the latter maintains

$$F(d_{u^*}) \in (p - \Delta p_1, p + \Delta p_2)$$   (4)

and if $d_{u^*}$ is also the only level satisfying

$$F(d_u) \in [p - \Delta p_1, p + \Delta p_2].$$   (5)

(ii) Almost-sure convergence to $d_{u^*}$ will also occur if $F(d_1) > p + \Delta p_2$ (meaning that $u^* = 1$) or $F(d_m) < p - \Delta p_1$ (meaning that $u^* = m$).

Proof: (i) We begin by showing that Pr(C) = 0, which is equivalent to $\Pr(u^* \in \mathcal{S}) = 1$. We do it by contradiction, assuming w.l.o.g. that there is some specific level $s_1 > u^*$ for which $\Pr(S_1 = s_1 > u^*) > 0$ (in other words, that there are sequences with a positive probability of occurring, in which beyond a certain point only levels above the MTD are visited). From the theorem's assumptions, we know that $F(d_{s_1}) > p + \Delta p_2$. Due to Lemma 1, this means that for $n$ large enough and all sequences described by the conditioned event,

$$\Pr\{\hat{F}_n(d_{s_1}) > p + \Delta p_2 \mid S_1 = s_1 > u^*\} = 1.$$   (6)

Given the interval design's transition rules, this means that eventually the next-lower level, $d_{s_1 - 1}$, will be allocated following each visit to $d_{s_1}$ with probability 1 conditioned on the above event.^1 It follows that $(s_1 - 1) \in \mathcal{S}$, reaching a contradiction. We conclude that there is no $s_1 > u^*$ for which $\Pr(S_1 = s_1 > u^*) > 0$, and therefore one cannot condition on such an event as was done in (6); similarly, there is no $s_2 < u^*$ for which $\Pr(S_2 = s_2 < u^*) > 0$. In terms of the partition of sequence space defined above, $\Pr(u^* \in \mathcal{S}) = \Pr(A \cup B) = 1$.

Now we can assume that $u^* \in \mathcal{S}$. Given the theorem's conditions and according to Lemma 1, eventually for $n$ large enough

$$\Pr\{\hat{F}_n(d_{u^*}) \in (p - \Delta p_1, p + \Delta p_2) \mid u^* \in \mathcal{S}\} = 1,$$

and so upon the next visit to $d_{u^*}$ it will be repeatedly allocated with probability 1. This means that with probability one (conditional upon $u^* \in \mathcal{S}$) there can be no other level in $\mathcal{S}$. Therefore Pr(A) = 1, and the interval design converges almost surely.

(ii) In the same vein, if all true $F$ values of design levels are above or below the target interval, then with probability 1 the boundary level with $F$ value closest to the interval ($d_1$ or $d_m$) belongs to $\mathcal{S}$, and eventually the design will repeatedly hit upon the boundary condition mandating repetition of that level with probability 1 as well. □

^1 Any monotonizing modifications to the $\hat{F}$'s do not matter as $n \to \infty$, since in that limit they are needed with probability zero. Or, if they involve a level not belonging to $\mathcal{S}$, their impact tends to zero.
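Theorem 1(i) and Lemma 1 can be illustrated (not proven) with a short simulation. The sketch below runs an interval design with cohorts of size 1 on a made-up five-level scenario in which only the MTD has a toxicity rate inside the interval, and reports how the late allocations are distributed. The scenario values, the function name `simulate_interval_design`, the seed, and the simplification of using raw frequencies without monotonization are all assumptions of this sketch.

```python
import random


def simulate_interval_design(true_f, p, dp1, dp2, n_subjects, seed=0, start=0):
    """Simulate an interval design with cohorts of size 1.

    Returns the allocation path (0-based levels) plus per-level toxicity
    counts and sample sizes. Raw frequencies are used, without monotonization.
    """
    rng = random.Random(seed)
    m = len(true_f)
    tox, n = [0] * m, [0] * m
    u, path = start, []
    for _ in range(n_subjects):
        path.append(u)
        tox[u] += 1 if rng.random() < true_f[u] else 0
        n[u] += 1
        f_hat = tox[u] / n[u]          # estimate at the current dose only
        if f_hat <= p - dp1:
            u = min(u + 1, m - 1)      # escalate, boundary permitting
        elif f_hat >= p + dp2:
            u = max(u - 1, 0)          # de-escalate, boundary permitting
        # otherwise the same dose is repeated


    return path, tox, n


# Hypothetical scenario: only the third level (index 2) lies in [0.2, 0.4].
true_f = [0.05, 0.15, 0.30, 0.50, 0.70]
path, tox, n = simulate_interval_design(true_f, p=0.3, dp1=0.1, dp2=0.1,
                                        n_subjects=20000)
tail = path[-2000:]
print("share of last 2000 allocations at the MTD:", tail.count(2) / len(tail))
print("estimate at the MTD:", round(tox[2] / n[2], 3))
```

With a sample this large the late allocations should sit almost entirely on the MTD, and the estimate there should be close to the true rate of 0.30, in line with Lemma 1; sample sizes of this order are of course far beyond those of actual dose-finding trials.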

The theorem's proof itself suggests what might happen in case its conditions 
are violated. Hence, the following two results are immediate. 

Corollary 1 (i) If no dose level satisfies (5) but $p \in [F(d_1), F(d_m)]$, an interval design would eventually oscillate with probability 1 between the two doses whose $F$ values straddle the target interval.

(ii) If there is more than one level satisfying (4), with probability 1 an interval design will converge to either of these levels. However, convergence to $d_{u^*}$ itself is not guaranteed.

4 Point-Design Convergence 

The point design has a positive probability of not converging. We show this via a simple, yet generic counterexample: Assume that $p < 1/2$, $F(d_{u^*}) = p$ and $F(d_{u^*-1}) < p$ (all other levels matter little). The experiment uses cohorts of size $k \ge 1$ and proceeds, as dose-finding trials often do, from below. Sooner or later $d_{u^*-1}$ is reached, and with high probability within a few cohorts we will have $\hat{F}(d_{u^*-1}) < p$, mandating escalation to $d_{u^*}$. Now, suppose that the very first cohort at $d_{u^*}$ is all toxicities; clearly the probability for that occurring is positive. Then $\hat{F}(d_{u^*}) = 1$. Since $p < 1/2$, regardless of the value of $\hat{F}(d_{u^*-1})$ it is now closer to target than $\hat{F}(d_{u^*})$ and it will be assigned. Moreover, since $\hat{F}(d_{u^*}) = 1$, monotonicity at $d_{u^*}$ will not be violated, meaning that no monotonizing corrections can modify this point estimate, which will remain too far from $p$ for the remainder of the experiment. Hence $d_{u^*}$ will never be assigned again. A similar argument was made by Cheung in a Biometrics letter to the editor (Cheung, 2002).
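The arithmetic behind this counterexample is simple enough to check directly. The following Python sketch (function names and the numerical values $p = 0.3$, $k = 3$ are hypothetical choices for illustration) computes the probability that the first cohort at the MTD is all toxicities when $F(d_{u^*}) = p$, and verifies that once $\hat{F}(d_{u^*}) = 1$, any estimate below 1 at the next-lower level is strictly closer to the target.

```python
def prob_first_cohort_all_toxic(f_mtd, k):
    """Probability that all k subjects of one cohort show toxicity."""
    return f_mtd ** k


def lower_level_preferred(f_hat_lower, p):
    """True if the lower level's estimate is strictly closer to p than 1 is."""
    return abs(f_hat_lower - p) < abs(1.0 - p)


p, k = 0.3, 3
print(round(prob_first_cohort_all_toxic(p, k), 3))  # 0.027

# For p < 1/2, every lower-level estimate in [0, 1) beats an estimate of 1.
print(all(lower_level_preferred(i / 100, p) for i in range(100)))  # True
```

So with cohorts of three and a 30% target, roughly one trial in 37 that reaches the MTD in this configuration would abandon it for good after a single cohort.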



5 Numerical Sensitivity Study 

5.1 Convergence of One-Parameter Designs 

How restrictive are the conditions outlined in Theorem 1? One way to gauge this is to compare them with the conditions of Shen and O'Quigley's proof for CRM-like one-parameter frequentist designs (Shen and O'Quigley, 1996). We now revisit its conditions in some detail. Beside straightforward conditions guaranteeing that the modeled dose-toxicity curve $G(x, \theta)$ can match the true curve $F$ at least at one $x$ value by changing the value of $\theta$, the proof focuses on how well $G$ fits $F$ elsewhere. Being a one-parameter model, $G$ cannot be guaranteed to do so simultaneously at more than one point. Moreover, the choice of this point uniquely determines $\theta$. Suppose w.l.o.g. that $G(d_u) = F(d_u)$, and call the resulting parameter value $\theta_u$. Then the level which, according to $G(\theta_u)$, appears to be the MTD will be called the level "nominated" by $d_u$ (since it will be allocated whenever $G$ matches $F$ at $d_u$). The crucial and most restrictive Shen-O'Quigley condition is that all levels nominate the true MTD. Hereafter we will refer to this convergence result as "CRM convergence", even though it is in fact a proof for the convergence of an analogous frequentist design.

Cheung and Chappell, in their interpretation of the proof, opine that this requires a very close match between $G$ and $F$ along the entire dose range (Cheung and Chappell, 2002). They go on to suggest that perhaps the Shen-O'Quigley conditions were too restrictive, and it might be enough for the MTD to nominate itself, and additionally for doses below the MTD to nominate higher doses than themselves, and vice versa. Thus, dose allocation might eventually be "funneled" towards the MTD. The conjecture has not been proven. Conversely, it is clear when a one-parameter Bayesian design cannot be guaranteed to converge:

1. When the MTD fails to nominate itself; or 

2. When other levels beside the MTD nominate themselves; or 

3. When the "funneling" conditions suggested by Cheung and Chappell are 
not met. 



5.2 Comparing Interval-Design and CRM Convergence 

We explored numerically the relative restrictiveness of the two sets of conditions. Since the convergence of both the interval design and CRM can be directly determined from the values of $F$ at the dose levels, together with the interval endpoints and the parameters of $G$, there is no need to simulate actual experiments. Rather, we simulated various scenarios of $F$ on $m = 5$ and $m = 10$ dose sets, and examined whether the CRM and the interval design convergence conditions are met for each scenario. We chose the target $p = 0.3$, the value most commonly used in Phase I cancer trials, which is the application for which both designs have been developed. For this target, developers of the CCD interval design recommend the interval (0.2, 0.4) when $m = 6$; they provide no recommendation for other values of $m$. We also explored the narrower interval (0.25, 0.35); hereafter we refer to the interval design in this simulation as "CCD". For CRM, we used the recently popular "power" model, in which $G(d_1, \ldots, d_m; \theta) = (p_1, \ldots, p_m)^{\exp(\theta)}$, with the $p$'s being prior toxicity rates assigned to each dose. Experienced CRM designers do not choose these rates solely according to toxicity information knowledge, but mostly in order to ensure sensible small-sample behavior. A choice commonly encountered in the field resembles a geometrically-increasing sequence, e.g. $(p_1, \ldots, p_m) = \{0.05, 0.1, 0.2, 0.4, 0.8\}$ for $m = 5$ (Pisters et al., 2004).



In order to generate reasonably realistic scenarios without restricting ourselves to a given distribution family, and also in order to minimize the direct impact of arbitrary conscious choice upon $F$, we simulated increments of $F$ in each scenario as a random Dirichlet vector. Dirichlet distribution parameters control the likelihood of generating various curves; these parameters themselves were randomly drawn out of a finite pool, producing a range of diverse, yet reasonably realistic $F$ curves, which would be relevant for the dose-finding problem as defined here. Additionally, lower and upper bounds were placed on increments of $F$ to exclude scenarios in which adjacent-dose toxicity rates are virtually indistinguishable, or spaced too far apart. Figure 2 shows a random sample of 20 scenarios (out of 2500 used in the study) for each of $m = 5$ and $m = 10$. Scenarios were simulated and convergence evaluated using the R language version 2.9.1 (R Development Core Team). Additional details appear in the supplementary material.

[Figure 2 panels: "Simulated 5-Level Toxicity Scenarios" (left), "Simulated 10-Level Toxicity Scenarios" (right); x-axis in each panel: Dose Level.]

Figure 2: Simulated dose-toxicity curves. Random samples of 20 out of the 2500 simulated scenarios, for $m = 5$ (left) and $m = 10$ (right).
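The scenario-generation idea described above can be sketched as follows. This is not the study's R code, whose details are in the supplementary material; the pool of Dirichlet parameters, the increment bounds, the use of rejection sampling to enforce the bounds, and the function name `random_scenario` are all assumptions made here for illustration. The Dirichlet draw is obtained by normalizing independent Gamma variates, so only the Python standard library is needed.

```python
import random


def random_scenario(m, alpha_pool, min_step, max_step, rng):
    """Draw one increasing toxicity curve F(d_1) < ... < F(d_m) inside (0, 1).

    The m + 1 increments (F(d_1), the m - 1 adjacent differences, and
    1 - F(d_m)) form a Dirichlet vector whose parameters are drawn at random
    from alpha_pool. Draws whose adjacent differences fall outside
    [min_step, max_step] are rejected and redrawn.
    """
    while True:
        alphas = [rng.choice(alpha_pool) for _ in range(m + 1)]
        gammas = [rng.gammavariate(a, 1.0) for a in alphas]
        total = sum(gammas)
        increments = [g / total for g in gammas]  # one Dirichlet(alphas) draw
        steps = increments[1:m]                   # adjacent-dose differences
        if all(min_step <= s <= max_step for s in steps):
            curve, level = [], 0.0
            for inc in increments[:m]:
                level += inc
                curve.append(level)
            return curve


rng = random.Random(2024)  # arbitrary seed for reproducibility
for _ in range(3):
    print([round(f, 2) for f in random_scenario(5, [0.5, 1.0, 2.0, 4.0],
                                                0.02, 0.5, rng)])
```

Drawing the Dirichlet parameters themselves at random, rather than fixing them, is what lets a single generator produce shallow, steep, concave and convex curves in the same ensemble.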

For CCD, we distinguish between the three possible convergence outcomes proven in Theorem 1 and Corollary 1:

• Convergence to the MTD guaranteed. The MTD is the only level in the interval, or the target is below/above the design dose range (column marked "Yes").

• More than one level in the interval, and hence only convergence to within the interval is assured, but not to the MTD itself (column marked "No: 2+").

• No level in the interval, and hence an asymptotic oscillating behavior is expected (column marked "No: 0").
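Because these three outcomes depend only on the true $F$ values and the interval endpoints, they can be determined without simulating any trial. The Python sketch below shows one way to do so; the function name `ccd_outcome` and the extra label "Not covered by Theorem 1" (returned in edge cases such as a rate lying exactly on an interval endpoint) are assumptions of this sketch, not part of the study. The examples avoid endpoint values, where floating-point rounding could blur the open/closed distinction.

```python
def ccd_outcome(true_f, p, dp1, dp2):
    """Classify a scenario by the interval design's asymptotic behavior.

    true_f: increasing true toxicity rates at the m levels.
    """
    lower, upper = p - dp1, p + dp2
    inside = [u for u, f in enumerate(true_f) if lower <= f <= upper]
    if len(inside) >= 2:
        return "No: 2+"                      # Corollary 1(ii)
    if len(inside) == 1:
        u = inside[0]
        mtd = min(range(len(true_f)), key=lambda v: abs(true_f[v] - p))
        if u == mtd and lower < true_f[u] < upper:
            return "Yes"                     # Theorem 1(i)
        return "Not covered by Theorem 1"    # hypothetical label for edge cases
    if true_f[0] > upper or true_f[-1] < lower:
        return "Yes"                         # Theorem 1(ii): all above or all below
    return "No: 0"                           # Corollary 1(i)


print(ccd_outcome([0.05, 0.15, 0.30, 0.50, 0.70], 0.3, 0.1, 0.1))  # Yes
print(ccd_outcome([0.05, 0.25, 0.35, 0.60, 0.80], 0.3, 0.1, 0.1))  # No: 2+
print(ccd_outcome([0.05, 0.15, 0.45, 0.60, 0.80], 0.3, 0.1, 0.1))  # No: 0
```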

For CRM, we distinguish between five possible outcomes:

• Convergence to the MTD guaranteed: all levels nominate the MTD (row marked "Yes").

• "Soft convergence": convergence not guaranteed by proof, but the Cheung-Chappell "funneling" conditions are met (row marked "Funneling").

• Convergence not guaranteed: one of the three failure modes outlined earlier (rows marked "No: 0", "No: 2+" and "No Funneling").
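The nomination logic behind these CRM outcomes can likewise be sketched for the power model $G(d_u; \theta) = p_u^{\exp(\theta)}$. The code below is one possible reading of the conditions, written for illustration only: the function names are hypothetical, and the mapping of the first two failure modes to the labels "No: 0" and "No: 2+" follows the order in which the text lists them, which is an assumption.

```python
import math


def nominated_level(u, true_f, skeleton, p):
    """Level (0-based) that looks like the MTD when G is fitted to match F at u."""
    # Solve skeleton[u] ** exponent = true_f[u], where exponent = exp(theta_u).
    exponent = math.log(true_f[u]) / math.log(skeleton[u])
    fitted = [s ** exponent for s in skeleton]
    return min(range(len(skeleton)), key=lambda v: abs(fitted[v] - p))


def crm_outcome(true_f, skeleton, p):
    """Classify a scenario by the one-parameter convergence conditions."""
    m = len(true_f)
    mtd = min(range(m), key=lambda v: abs(true_f[v] - p))
    nom = [nominated_level(u, true_f, skeleton, p) for u in range(m)]
    if all(v == mtd for v in nom):
        return "Yes"                 # Shen-O'Quigley: every level nominates the MTD
    if nom[mtd] != mtd:
        return "No: 0"               # the MTD fails to nominate itself
    if any(nom[u] == u for u in range(m) if u != mtd):
        return "No: 2+"              # another level also nominates itself
    funnel = (all(nom[u] > u for u in range(mtd)) and
              all(nom[u] < u for u in range(mtd + 1, m)))
    return "Funneling" if funnel else "No Funneling"


skeleton = [0.05, 0.1, 0.2, 0.4, 0.8]             # prior rates quoted in the text
true_f = [0.02, 0.12, 0.30, 0.55, 0.75]           # hypothetical true curve
print([nominated_level(u, true_f, skeleton, 0.3) for u in range(5)])  # [3, 2, 2, 2, 3]
print(crm_outcome(true_f, skeleton, 0.3))                              # Funneling
```

In this made-up scenario the MTD (index 2) nominates itself, the two lower levels nominate higher levels and the two upper levels nominate lower ones, so only the weaker "funneling" conditions hold.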

Below are tables of simulation results for five and ten design levels. The full Shen-O'Quigley conditions for CRM convergence are only rarely met. The weaker "funneling" conditions are met in a quarter of the $m = 5$ cases, but nearly half of the $m = 10$ cases. A notable observation is that only two of the three CRM failure modes take place, at least in these simulations; hence, only four outcomes are tabulated instead of five. The missing entry is "No Funneling": if funneling is violated, then always (in our simulation runs) one of the other two conditions is violated as well.

Table 1: Comparative theoretical convergence summary of CRM and CCD designs, for a diverse ensemble of numerically-generated scenarios. The MTD was defined as the dose level whose true F value is closest to 0.3. All numbers in the table are in percents. Row and column labels are explained in the text.

m = 5                              CCD (width: ±0.1)          CCD (width: ±0.05)
                   CRM Margins   No: 0   No: 2+    Yes      No: 0   No: 2+    Yes
CCD Margins                        7.6     56.2    36.2      32.6     13.4    53.9
CRM  No: 0             17.0        7.0      2.1     7.8      16.6      0.0     0.4
CRM  No: 2+            56.3        0.3     46.4     9.6       9.4     13.4    33.6
CRM  "Funneling"       25.0        0.1      7.5    17.4       6.0      0.1    18.8
CRM  Yes                1.7        0.2      0.1     1.4       0.6      0.0     1.1

m = 10                             CCD (width: ±0.1)          CCD (width: ±0.05)
                   CRM Margins   No: 0   No: 2+    Yes      No: 0   No: 2+    Yes
CCD Margins                        0.8     92.2     7.0       4.6     53.6    41.8
CRM  No: 0             13.4        0.0     11.5     1.9       3.1      2.9     7.3
CRM  No: 2+            41.4        0.0     41.3     0.2       0.1     36.6     4.8
CRM  "Funneling"       44.4        0.4     39.4     4.7       0.9     14.1    29.4
CRM  Yes                0.8        0.4      0.1     0.3       0.5      0.0     0.3

Observing the CCD results, exact convergence to the MTD is guaranteed in a far larger number of cases than with CRM. Together with the multiple-level ("No: 2+") cases, in the vast majority of simulated scenarios CCD is guaranteed to converge to within the specified interval. Comparing the narrower and wider interval design options, we see that the former performs better with more design levels. This suggests that the density of dose levels should also play a role in determining the interval width, a point somewhat side-stepped by CCD's developers.

Comparing the two design approaches by case, there is a strong associa- 
tion between the CRM's failure to converge due to multiple self-nominating 
levels, and CCD's failure to converge due to multiple levels inside the interval: 
practically all scenarios indicating the former, also indicate the latter (but not 
vice versa). The "funneling" scenarios are associated with CCD scenarios that 
converge to within the interval. 

6 Discussion 

6.1 Convergence and the Role of Simulation 

As mentioned in the introduction, the explosion in novel long-memory design development lacks an accompanying effort to prove design convergence. However, the common thread between all dose-finding designs is a self-correction mechanism to concentrate treatments around target. If this mechanism is sound, then it should eventually converge to some stationary behavior with desirable properties vis-a-vis the MTD. If convergence cannot be guaranteed under realistic conditions, then the self-correction mechanism itself is suspect regardless of sample size. In other words, convergence should be viewed as a necessary condition for dose-finding designs (albeit not a sufficient one, since small-sample behavior does need to be examined separately). Therefore, the study of convergence should play a larger role in the field of novel dose-finding designs.

As to the use of simulation itself, we attempted to minimize the effect of direct human choice on the tested scenarios. Ivanova and coworkers (Ivanova et al., 2007) started along this direction, choosing $F$ values out of an ordered uniform distribution. We believe our approach further expands the horizons for a distribution-free simulation study, and does succeed in sampling a sizable region of the space of distributions that would be considered realistic by researchers in the field. It might serve as an initial template for future benchmark comparative performance simulations between designs, of the type that is common in fields such as machine learning.



6.2 Implications for Interval Designs and CRM 

The results summarized in Table 1 underscore Cheung and Chappell's observation that the existing Shen-O'Quigley CRM convergence proof requires rather restrictive conditions. CRM convergence occurs only in a small fraction of simulated cases, and we venture to suggest the conditions for it are only rarely met in practice. Thus, an accurate interpretation of the Shen-O'Quigley result seems to be that CRM's convergence under correct specification (which for one-parameter models is an immediate result of standard MLE convergence theorems) can be extended to very mildly misspecified models.^2 The Cheung-Chappell "funneling" conditions seem far more realistic than the Shen-O'Quigley conditions; perhaps this observation will serve as motivation for finding a proof that they indeed guarantee convergence. At this moment it is unclear whether such a proof is feasible, or what additional conditions it would require.

^2 One way to quantify the degree of misspecification is by measuring the total variation distance between $F$ and $G$, with $\theta$ chosen such that $F = G$ at the MTD. With $m = 5$ the $G$ curves satisfying the Shen-O'Quigley conditions are about half as far, on the average, from $F$ as the other curves. With $m = 10$ the difference is approximately threefold on the average.

The CCD interval design converged to the MTD fairly often. The prospects of convergence improve with a well-informed choice of interval width. Our theoretical and numerical results suggest a far simpler approach to optimal interval-width choice than the intensive multi-scenario small-sample numerical study by the method's originators (Ivanova et al., 2007). Based on Theorem 1, study designers should aim to capture exactly one dose level in the interval, and erring towards more than one level is probably more desirable than capturing none. In the absence of prior scientific knowledge about the slope of $F$ around target, a total interval width of $1/m$ would do, as long as researchers believe that all $m$ levels have toxicity rates not too close to either 0 or 1. In any case, even when CCD does not converge to the MTD (whether due to multiple levels in the interval, or none) one can still guarantee a predictable asymptotic behavior with respect to the pre-specified interval. This is not the case with parametric designs in general and with one-parameter CRM in particular.



6.3 Convergence, Small-Sample Behavior and Simulation 

Convergence studies can also shed light upon designs' small-sample behavior. For example, the up-and-down designs mentioned in the introduction converge at a geometric rate: their short memory facilitates a very quick self-correction mechanism. However, the self-correction is blunt: asymptotic behavior meanders around target, typically spreading most allocations over 2-4 levels.^3 Long-memory designs converge (if they converge) much more slowly, at a root-n rate; this means that their self-correction, even at small samples, is also slow. The promised compensation is a perfectly sharp asymptotic allocation distribution, zooming in on the MTD itself. Regardless of design and proof details, this outcome hinges upon the precision of point estimates, whose convergence was demonstrated in Lemma 1. Unfortunately, simple arithmetic on Binomial probabilities suggests that for point estimates to be precise with high reliability requires many more trials than the typical dose-finding sample size of 10-40 subjects, who are inevitably spread over several dose levels.
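The "simple arithmetic on Binomial probabilities" can be made concrete with a short calculation. The sketch below (the function name and the chosen sample sizes are illustrative assumptions) computes the exact probability that a raw frequency based on $n$ subjects at a dose with true toxicity rate 0.3 falls strictly inside the interval (0.2, 0.4).

```python
from math import comb


def prob_estimate_in_interval(n, f, lower, upper):
    """P(lower < X/n < upper) for X ~ Binomial(n, f)."""
    return sum(comb(n, x) * f ** x * (1 - f) ** (n - x)
               for x in range(n + 1) if lower < x / n < upper)


for n in (3, 10, 20, 40, 100):
    print(n, round(prob_estimate_in_interval(n, 0.3, 0.2, 0.4), 3))
```

For $n = 10$ only the outcome of exactly 3 toxicities lands strictly inside the interval, so the probability is about 0.27; reliability above 90% is reached only with a per-level sample on the order of 100, which exceeds the total size of most dose-finding trials.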

In fact, the main difference between asymptotic and small-sample behavior is that the latter is dominated by very imprecise Binomial point estimates. Thus, during an initial stage of the experiment, a long-memory design might point towards a level quite far from the MTD, if a large enough proportion of the individual-subject toxicity trials yielded "atypical" outcomes. In sampling terms, if the initial group of sampled toxicity thresholds can be seen collectively as an outlier, then long-memory designs (both parametric and nonparametric) are led astray. When this happens, the long memory and its associated slow self-correction become liabilities rather than assets: point estimates are off, and they will now improve only gradually because the initial, "outlier" group of outcomes is still included in any subsequent estimate. Meanwhile, the design will insist upon collecting information at the wrong place, further slowing the self-correction mechanism. In practice, this positive-feedback reaction makes long-memory designs less robust to the experiment's first few observations, compared with short-memory designs. This phenomenon is unrelated to design details (parametric or nonparametric), and has been observed numerically for both CCD and CRM (Oron, 2007; Oron and Hoff, 2009). It underscores the two messages conveyed here: 1. The study of convergence properties can help explain small-sample behavior, and 2. Convergence is a necessary requirement for a sound dose-finding design, but convergence alone is not sufficient to guarantee desirable small-sample behavior.

^3 In spite of the blunt allocation distribution, up-and-down estimates do become sharper with time, since they rely on all the gathered information.